Patchwork Kriging for Large-scale Gaussian Process Regression
Authors
Abstract
This paper presents a new approach for Gaussian process (GP) regression for large datasets. The approach involves partitioning the regression input domain into multiple local regions with a different local GP model fitted in each region. Unlike existing local partitioned GP approaches, we introduce a technique for patching together the local GP models nearly seamlessly to ensure that the local GP models for two neighboring regions produce nearly the same response prediction and prediction error variance on the boundary between the two regions. This effectively solves the well-known discontinuity problem that degrades the boundary accuracy of existing local partitioned GP methods. Our main innovation is to represent the continuity conditions as additional pseudo-observations that the differences between neighboring GP responses are identically zero at an appropriately chosen set of boundary input locations. To predict the response at any input location, we simply augment the actual response observations with the pseudo-observations and apply standard GP prediction methods to the augmented data. In contrast to heuristic continuity adjustments, this has the advantage of working within a formal GP framework, so that the GP-based predictive uncertainty quantification remains valid. Our approach also inherits a sparse block-like structure for the sample covariance matrix, which results in computationally efficient closed-form expressions for the predictive mean and variance. In addition, we provide a new spatial partitioning scheme based on a recursive space partitioning along local principal component directions, which makes the proposed approach applicable for regression domains having more than two dimensions. Using three spatial datasets and three higher dimensional datasets, we investigate the numerical performance of the approach and compare it to several state-of-the-art approaches.
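To make the pseudo-observation idea concrete, the following is a minimal NumPy sketch, not the authors' code, under simplifying assumptions: two 1-D regions sharing a single boundary point, a priori independent local GPs f1 and f2 with a common squared-exponential kernel, and a pseudo-observation stating that f1 - f2 equals exactly zero at the boundary. All names (sq_exp_kernel, Xb, etc.) are illustrative.

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=0.2, variance=1.0):
    # Squared-exponential covariance between 1-D input arrays A and B.
    d = A[:, None] - B[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)

# Toy data: region 1 is [0, 0.5), region 2 is [0.5, 1]; one shared boundary point.
f_true = lambda x: np.sin(2.0 * np.pi * x)
X1 = rng.uniform(0.0, 0.5, 30); y1 = f_true(X1) + 0.1 * rng.standard_normal(30)
X2 = rng.uniform(0.5, 1.0, 30); y2 = f_true(X2) + 0.1 * rng.standard_normal(30)
Xb = np.array([0.5])
noise = 0.1 ** 2

# Local GPs f1, f2 are a priori independent with a shared kernel.  The pseudo-
# observation says delta(Xb) = f1(Xb) - f2(Xb) is observed to equal exactly 0,
# which patches the two local models together at the boundary.
K11 = sq_exp_kernel(X1, X1) + noise * np.eye(len(X1))
K22 = sq_exp_kernel(X2, X2) + noise * np.eye(len(X2))
K1b = sq_exp_kernel(X1, Xb)            # Cov(f1(X1), delta(Xb)) =  k(X1, Xb)
K2b = -sq_exp_kernel(X2, Xb)           # Cov(f2(X2), delta(Xb)) = -k(X2, Xb)
Kbb = 2.0 * sq_exp_kernel(Xb, Xb)      # Var(delta(Xb)) = 2 * k(Xb, Xb)

# Covariance of the augmented vector [y1, y2, delta(Xb)]; the zero blocks come
# from the prior independence of the local GPs (the sparse block-like structure).
Z = np.zeros((len(X1), len(X2)))
C = np.block([[K11,   Z,     K1b],
              [Z.T,   K22,   K2b],
              [K1b.T, K2b.T, Kbb]])
y_aug = np.concatenate([y1, y2, np.zeros(len(Xb))])   # pseudo-observations = 0

# Standard GP prediction of f1 at test inputs in region 1, conditioning on the
# augmented data: cross-covariance with y2 is zero, with delta(Xb) it is +k.
Xs = np.linspace(0.0, 0.5, 50)
c_star = np.vstack([sq_exp_kernel(X1, Xs),
                    np.zeros((len(X2), len(Xs))),
                    sq_exp_kernel(Xb, Xs)])
alpha = np.linalg.solve(C, y_aug)
pred_mean = c_star.T @ alpha
pred_var = np.diag(sq_exp_kernel(Xs, Xs) - c_star.T @ np.linalg.solve(C, c_star))
print(pred_mean[:3], pred_var[:3])
```

Prediction in region 2 is analogous, with the sign of the boundary cross-covariance flipped; because the observation blocks of the two regions are uncorrelated, the augmented covariance retains the sparse block structure mentioned in the abstract.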
Similar resources
Hierarchical Mixture-of-Experts Model for Large-Scale Gaussian Process Regression
We propose a practical and scalable Gaussian process model for large-scale nonlinear probabilistic regression. Our mixture-of-experts model is conceptually simple and hierarchically recombines computations for an overall approximation of a full Gaussian process. Closed-form and distributed computations allow for efficient and massive parallelisation while keeping the memory consumption small. G...
Beyond Classification – Large-scale Gaussian Process Inference and Uncertainty Prediction
Due to the massive (labeled) data available on the web, a tremendous interest in large-scale machine learning methods has emerged in recent years. Whereas most of the work done in this new area of research has focused on fast and efficient classification algorithms, we show in this paper how other aspects of learning can also be covered using massive datasets. The paper briefly presents techniq...
Fast Gaussian Process Regression using KD-Trees
The computation required for Gaussian process regression with n training examples is about O(n³) during training and O(n) for each prediction. This makes Gaussian process regression too slow for large datasets. In this paper, we present a fast approximation method, based on kd-trees, that significantly reduces both the prediction and the training times of Gaussian process regression.
laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R
Gaussian process (GP) regression models make for powerful predictors in out-of-sample exercises, but cubic runtimes for dense matrix decompositions severely limit the size of data—training and testing—on which they can be deployed. That means that in computer experiment, spatial/geo-physical, and machine learning contexts, GPs no longer enjoy privileged status as modern data sets continue ball...
KNN-based Kalman filter: An efficient and non-stationary method for Gaussian process regression
The traditional Gaussian process (GP) regression often deteriorates when the data set is large-scale and/or non-stationary. To address these challenging data properties, we propose a K-Nearest-Neighbor-based Kalman filter for Gaussian process regression (KNN-KFGP). Firstly, we design a test-input-driven KNN mechanism to group the training set into a number of small collections. Secondly, we u...
Journal: CoRR
Volume: abs/1701.06655
Pages: -
Publication date: 2017